-
Notifications
You must be signed in to change notification settings - Fork 101
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(listing): pick exact or create new one on update #726
Conversation
Deploying datachain-documentation with Cloudflare Pages
|
50cf27d
to
08e1e75
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #726 +/- ##
==========================================
- Coverage 87.43% 87.43% -0.01%
==========================================
Files 114 114
Lines 10967 10965 -2
Branches 1508 1507 -1
==========================================
- Hits 9589 9587 -2
Misses 998 998
Partials 380 380
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
08e1e75
to
73b68b9
Compare
73b68b9
to
4f66965
Compare
4f66965
to
8ffe1d4
Compare
8ffe1d4
to
f4b023c
Compare
src/datachain/lib/dc.py
Outdated
if listings and not update: | ||
listing = sorted(listings, key=lambda ls: ls.created_at)[-1] | ||
|
||
# For local file system we need to fix listing path / prefix |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ilongin Q: why do we need to do this only for the local file system?
@@ -38,20 +39,43 @@ def _tree_to_entries(tree: dict, path=""): | |||
@pytest.fixture | |||
def listing(test_session): | |||
catalog = test_session.catalog | |||
dataset_name, _, _, _ = DataChain.parse_uri("s3://whatever", test_session) | |||
dataset_name, _, _, _ = DataChain.parse_uri("file:///whatever", test_session) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[C]: minor semi-related change - there is not reason to use s3
, tests were failing since it was hitting actual token update w/o proper fixtures. cc @ilongin
f4b023c
to
3bcaa78
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Partially addresses and stabilizes #725
Fixes a few situations in the way we find and suggest an existing cached listing when we do an
update=True
.1. There was a bug in the origin code:
This code actually was picking the "biggest" one instead. There should have been
reverse=True
.So, in a situation like:
file:///whatever/dir1
exists and cachedfile:///whatever/
exists and cachedand we were trying to update the
file:///whatever/dir1
- it was updating thefile:///whatever/
instead.2. Then, in a situation like:
file:///whatever/
exists and cachedfile:///whatever/dir1
doesn't yet existand we do "update=True" for the
file:///whatever/dir1
it was updating the whole
file:///whatever/
which can be arbitrary large. And we don't want to do this.So, bottom line is:
Generalizing a bit code for both of those cases led to the code in this PR.